Multi-source face tracking with audio and visual data

نویسندگان

  • Ce Wang
  • Michael S. Brandstein
چکیده

A real-time, face tracker based on both sound and visual cues is presented. Initial talker locations are estimated acoustically from microphone array data while precise localization and tracking are derived from visual data. The image processing employs a hierarchical structure which utilizes source motion, contour geometry, color data, and facial features. The resulting system is capable of tracking multiple people in complex backgrounds and robustly discriminating faces from similar objects. While the direct focus of this work is automated video conferencing, the face tracking capability has utility to many multimedia and virtual reality applications.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speaker Tracking Using an Audio-visual Particle Filter

We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view fa...

متن کامل

An embedded audio-visual tracking and speech purification system on a dual-core processor platform

Design of an embedded audio–visual tracking and speech purification system is described in this paper. The system is able to perform human face tracking, voice activity detection, sound source direction estimation, and speech enhancement in real-time. Estimating the sound source directions helps to initialize the human face tracking module when the target changes the direction. The implementati...

متن کامل

Audiovisual-based adaptive speaker identification

An adaptive speaker identification system is presented in this paper, which aims to recognize speakers in feature films by exploiting both audio and visual cues. Specifically, the audio source is first analyzed to identify speakers using a likelihood-based approach. Meanwhile, the visual source is parsed to recognize talking faces using face detection/recognition and mouth tracking techniques. ...

متن کامل

An Audio-Visual Particle Filter for Speaker Tracking on the CLEAR'06 Evaluation Dataset

We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view fa...

متن کامل

Audio-visual SPeaker localization for car navigation systems

Human-computer interaction for in-vehicle information and navigation systems is a challenging problem because of the diverse and changing acoustic environments. It is proposed that the integration of video and audio information can significantly improve dialog system performance, since the visual modality is not impacted by acoustic noise. In this paper, we propose a robust audio-visual integra...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999